davepacheco
Collaborator

This is the first part of #8859. This PR adds the logic to track the blueprint as of which this Nexus's sagas are known to be drained. Once we have db_metadata_nexus records (currently #8845), the last bit of #8859 will be to update those records whenever this value changes.

This is still a work in progress. I need to add some new tests and also put this into omdb.

@davepacheco
Collaborator Author

Some sample output from the new omdb, run against cargo xtask omicron-dev run-all:

Initial state:

$ cargo run --bin=omdb -- --dns-server=[::1]:64971 nexus quiesce show
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 1.17s
     Running `target/debug/omdb '--dns-server=[::1]:64971' nexus quiesce show`
note: Nexus URL not specified.  Will pick one from DNS.
note: using Nexus URL http://[::1]:12221
running normally (not quiesced, not quiescing)
saga quiesce:
    new sagas: Allowed
    drained as of blueprint: none
    blueprint for last recovery pass: none
    blueprint for last reassignment pass: none
    reassignment generation: 1 (pass running: no)
    recovered generation: 1
    recovered at least once successfully: yes
    sagas running: 0
database connections held: 0

Enabled blueprint execution:

$ ./target/debug/omdb --dns-server=[::1]:64971 nexus blueprints target enable current -w
note: Nexus URL not specified.  Will pick one from DNS.
note: using Nexus URL http://[::1]:12221
set target blueprint c144655e-ab2e-4a4e-aa21-b7f1f70e4620 to enabled

$ cargo run --bin=omdb -- --dns-server=[::1]:64971 nexus quiesce show
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 1.14s
     Running `target/debug/omdb '--dns-server=[::1]:64971' nexus quiesce show`
note: Nexus URL not specified.  Will pick one from DNS.
note: using Nexus URL http://[::1]:12221
running normally (not quiesced, not quiescing)
saga quiesce:
    new sagas: Allowed
    drained as of blueprint: none
    blueprint for last recovery pass: c144655e-ab2e-4a4e-aa21-b7f1f70e4620
    blueprint for last reassignment pass: c144655e-ab2e-4a4e-aa21-b7f1f70e4620
    reassignment generation: 1 (pass running: no)
    recovered generation: 1
    recovered at least once successfully: yes
    sagas running: 0
database connections held: 0

Created a demo saga:

$ cargo run --bin=omdb -- --dns-server=[::1]:64971 nexus quiesce show
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 1.13s
     Running `target/debug/omdb '--dns-server=[::1]:64971' nexus quiesce show`
note: Nexus URL not specified.  Will pick one from DNS.
note: using Nexus URL http://[::1]:12221
running normally (not quiesced, not quiescing)
saga quiesce:
    new sagas: Allowed
    drained as of blueprint: none
    blueprint for last recovery pass: c144655e-ab2e-4a4e-aa21-b7f1f70e4620
    blueprint for last reassignment pass: c144655e-ab2e-4a4e-aa21-b7f1f70e4620
    reassignment generation: 1 (pass running: no)
    recovered generation: 1
    recovered at least once successfully: yes
    sagas running: 1
        saga 2bb35305-bd26-4476-b259-718bdb20b53a pending since 2025-08-27T22:40:42.260Z (demo)
database connections held: 0

Start quiescing:

$ cargo run --bin=omdb -- --dns-server=[::1]:64971 nexus quiesce start -w
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 1.14s
     Running `target/debug/omdb '--dns-server=[::1]:64971' nexus quiesce start -w`
note: Nexus URL not specified.  Will pick one from DNS.
note: using Nexus URL http://[::1]:12221
quiescing since 2025-08-27T22:40:58.983Z (0s ago)
details: waiting for running sagas to finish
saga quiesce:
    new sagas: DisallowedQuiesce
    drained as of blueprint: none
    blueprint for last recovery pass: c144655e-ab2e-4a4e-aa21-b7f1f70e4620
    blueprint for last reassignment pass: c144655e-ab2e-4a4e-aa21-b7f1f70e4620
    reassignment generation: 1 (pass running: no)
    recovered generation: 1
    recovered at least once successfully: yes
    sagas running: 1
        saga 2bb35305-bd26-4476-b259-718bdb20b53a pending since 2025-08-27T22:40:42.260Z (demo)
database connections held: 0

Complete the demo saga:

$ cargo run --bin=omdb -- --dns-server=[::1]:64971 nexus quiesce show
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 1.15s
     Running `target/debug/omdb '--dns-server=[::1]:64971' nexus quiesce show`
note: Nexus URL not specified.  Will pick one from DNS.
note: using Nexus URL http://[::1]:12221
quiesced since 2025-08-27T22:41:18.609Z (5s 411ms ago)
    waiting for sagas took 19s 626ms
    waiting for db quiesce took 0s
    recording quiesce took 0s
    total quiesce time: 19s 626ms
saga quiesce:
    new sagas: DisallowedQuiesce
    drained as of blueprint: c144655e-ab2e-4a4e-aa21-b7f1f70e4620
    blueprint for last recovery pass: c144655e-ab2e-4a4e-aa21-b7f1f70e4620
    blueprint for last reassignment pass: c144655e-ab2e-4a4e-aa21-b7f1f70e4620
    reassignment generation: 1 (pass running: no)
    recovered generation: 1
    recovered at least once successfully: yes
    sagas running: 0
database connections held: 0

@davepacheco davepacheco marked this pull request as ready for review August 27, 2025 22:44
@davepacheco davepacheco requested a review from jgallagher August 27, 2025 22:45
/// whether a saga recovery operation is ongoing, and if one is:
/// - what `reassignment_generation` was when it started
/// - which blueprint id we'll be fully caught up to upon completion
#[serde(skip)] // XXX-dap
Contributor

XXX here because we don't want to skip this field?

Collaborator Author

Yes -- sorry I missed that! This is a problem because we don't support a tuple in this context in the OpenAPI spec. I will replace it with a struct.
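
A minimal sketch of the tuple-to-struct change described above, not the code in this PR. It assumes the skipped field was an optional tuple pairing the `reassignment_generation` at the start of the recovery pass with a blueprint id. The names `PendingRecovery`, `RecoveryTracker`, and `recovery_start` are hypothetical, the ids are plain strings for brevity, and the serialization and schema derives are omitted so the block builds with the standard library alone.

```rust
/// Hypothetical named replacement for a tuple such as `(u64, Option<String>)`.
/// Named fields give a schema generator something to describe, where a bare
/// tuple may not be supported.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct PendingRecovery {
    /// value of `reassignment_generation` when the recovery pass started
    pub generation: u64,
    /// blueprint id we will be fully caught up to when the pass completes
    /// (assumed optional here)
    pub blueprint_id: Option<String>,
}

/// Hypothetical, simplified holder for the state around the field.
#[derive(Debug, Default)]
pub struct RecoveryTracker {
    pub reassignment_generation: u64,
    pub reassignment_blueprint_id: Option<String>,
    /// `Some` while a recovery pass is ongoing, `None` otherwise
    pub recovery_pending: Option<PendingRecovery>,
}

impl RecoveryTracker {
    /// Record what we will have caught up to once the pass that is starting
    /// now completes.
    pub fn recovery_start(&mut self) {
        self.recovery_pending = Some(PendingRecovery {
            generation: self.reassignment_generation,
            blueprint_id: self.reassignment_blueprint_id.clone(),
        });
    }
}
```

With a struct in place of the tuple, the field no longer needs to be skipped during serialization, and each component carries its own documentation.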

}
};

q.latch_drained_blueprint_id();
Contributor

Is it correct to latch this even if quiescing is false?

Collaborator Author

@davepacheco davepacheco Aug 28, 2025

Yes, the function checks that.

Edit: to be precise, it is not correct to latch the value in this case. The function latch_drained_blueprint_id is intended to be callable at any time: it checks the state itself and latches only when appropriate. Is there a better name for that?
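
The "safe to call at any time" pattern described here can be sketched roughly as follows. This is an illustration under stated assumptions, not the function from this PR: `SagaQuiesceState`, its fields, and `is_fully_drained` are hypothetical, ids are plain strings, and the exact drain conditions are guessed from the fields shown in the omdb output above.

```rust
/// Hypothetical, simplified stand-in for the saga quiesce state.
#[derive(Debug, Default)]
pub struct SagaQuiesceState {
    /// true once new sagas are disallowed because quiescing has begun
    pub quiescing: bool,
    pub sagas_running: usize,
    pub reassignment_generation: u64,
    pub recovered_generation: u64,
    pub first_recovery_complete: bool,
    /// blueprint id that the last recovery pass caught up to
    pub recovered_blueprint_id: Option<String>,
    /// latched value: blueprint id as of which we are known to be drained
    pub drained_blueprint_id: Option<String>,
}

impl SagaQuiesceState {
    /// Assumed drain conditions: quiescing, nothing running, and recovery
    /// has caught up with the latest reassignment pass.
    fn is_fully_drained(&self) -> bool {
        self.quiescing
            && self.sagas_running == 0
            && self.first_recovery_complete
            && self.recovered_generation >= self.reassignment_generation
    }

    /// Safe to call at any time: a no-op unless the state is drained.
    pub fn latch_drained_blueprint_id(&mut self) {
        if !self.is_fully_drained() {
            return;
        }
        if let Some(id) = &self.recovered_blueprint_id {
            self.drained_blueprint_id = Some(id.clone());
        }
    }
}
```

Putting the check inside the function means callers cannot latch at the wrong time, at the cost of a name that reads as unconditional, which is the concern raised in this thread.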

Contributor

latch_blueprint_id_if_drained() maybe?
